ftp.cs.arizona.edu

Internet Message Format | 1996-01-31 | 4KB

Received: from iesd.auc.dk by optima.CS.Arizona.EDU (5.65c/15) via SMTP id AA08651; Thu, 13 May 1993 07:16:48 MST
Received: from yellow.iesd.auc.dk by iesd.auc.dk with SMTP id AA03200 (5.65c8/IDA-1.5/MD for <tsql@cs.arizona.edu>); Thu, 13 May 1993 16:16:48 +0200
Date: Thu, 13 May 1993 16:16:48 +0200
From: "Christian S. Jensen" <csj@iesd.auc.dk>
Message-Id: <199305131416.AA03200@iesd.auc.dk>
To: tsql@cs.arizona.edu
Subject: Re: Benchmark Query Taxonomy

Dear Fabio,

I thank you for your consistent interest in the benchmark. Below, I attempt to provide an adequate answer to your problem. Like you, I hope this reply may also benefit other contributors.

The query to be categorized is

    Find the date of birth of *ED*.

I will categorize the query according to the taxonomy as summarized in Figure 6 of the benchmark document. When that is done, classification according to the coarser Figure 7 is trivial.

Output: I assume that the expected answer to the query is "7/1/55." In that case there is an explicit-attribute component in the output. Compared with the argument relation schema, the component is "projected," i.e., attributes such as Name and Salary are missing. Next, there is no valid-time component. The D-birth attribute is a user-defined time attribute. So far we have "(Projected, None)."

Classifying the query with respect to valid-time selection provides me with an opportunity to make a more general observation: Natural language queries are notoriously imprecise compared with queries expressed in a temporal query language. Since the taxonomy is derived from temporal query languages, it is also "precise." This does make the classification of natural language queries a challenge. A good rule seems to be: "if it is hard to categorize a query, then make it more precise." Specifically, is there a hidden valid-time selection criterion in the query? If yes, what is that criterion?

In a query language we would typically be forced to specify when the birth date is supposed to be valid (often the default is the current time). Thus, I'll assume that the query is

    Find the current date of birth of *ED*.

Since the birth date does not change, we know that the birth date valid at any other time is the same. Now, there is a valid-time selection. Specifically, we select a date of birth if the associated valid time contains the event with the current time as value. This gives (Containment, Event, Explicit).

Considering non-temporal selection, it seems clear that we have an equality predicate saying that the name attribute value should be the (current) name of *ED*. This yields (=, Constant). As an aside, I would probably phrase the query with an explicit reference to the name of *ED*. I would ask for the date of birth of the person who is currently named Edward.

The final result is

    (Projected, None) / (Containment, Event, Explicit) / (=, Constant)

I chose to address the more specific questions in your message only indirectly because I did not understand them fully! Let me illustrate my problem.

> With reference to the last benchmark draft,
> should the output of such queries be classified
> as: (a) - (Projected, None)
> or as: (b) - (None, (*, "value") ) ?

The taxonomy has three parts, separated by "/"s. No part has nested parentheses. Thus, neither (a) nor (b) is syntactically correct. Should (a) be "(Projected, None)//"? What is (b)? I am sorry that I was not able to follow your thoughts... I am eager to learn more.

I think we will all identify problems with the database schema, the instance, and the taxonomy when we propose queries. I know that I have already. But I do not think it is realistic to change those parts and, at the same time, expect a finished benchmark before the TDB workshop. It will be impossible for contributors to keep up with the changes, and queries will have to be rewritten or recategorized every time a change is made. Can we even arrive at consensus fast enough to incorporate changes? This is an experiment. The next benchmark can benefit from the lessons learned here.

Best regards,

Christian
csj@iesd.auc.dk